KRSA Vignette

library(KRSA)
library(knitr)
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
#> ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
#> ✓ tibble  3.0.6     ✓ dplyr   1.0.4
#> ✓ tidyr   1.1.2     ✓ stringr 1.4.0
#> ✓ readr   1.4.0     ✓ forcats 0.5.0
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()

library(gt) # can be used to view tables
library(furrr) # can be used for parallel computing 
#> Loading required package: future

Introduction

Background

The Pamstation12 instrument provides a profiling of kinase activity of cell or tissue samples. The device is loaded with either serine/threonine or tyrosine microarray chips. Each chip has 4 wells so four samples can be loaded on a single chip, and the Pamstation12 can accommodate 3 chips per run. The microarray represents 144 (STK chip) or 196 (PTK chip) reporter peptides that can be phosphorylated by serine/threonine or tyrosine kinases. The device measures the degree of the phosphorylation in real time by detecting fluorescently labeled antibodies at different exposure times. The list of peptides present in each microarray can be viewed here: STK chip, PTK chip

Run Design

Designing the placement of the samples on the chips and arrays is important to consider due to the variability across different chips and batches. During the run some wells are subject to fail and their data cannot be analyzed and shown below as red.

Results

Image Analysis

The first step of analyzing the run is to convert the images taken by the PamStation of each array at different exposure times to numerical values This is done by the Bionavigator software developed by Pamgene. The software recognizes the grid of the array with the aid of the searching algorithm (Pamgrid) to correctly identify each spot on the array. The numbers produced by this software represent the median value of the foreground pixels minus the median value of the background pixels to produce the median signal minus background (Median_SigmBg).

Reading Data

The first step will be reading the crosstab view bionavigator files (Median_SigmBg and Signal_Saturation) and defining the PamChip type (STK or PTK). The raw data is read and then transformed to be in tidy format for an easier analysis, modeling, and visualizing.

QC Initial Steps and Groups Assignments

We will perform a couple of quality control steps to deal with negative values in the data and adjust based on signal saturation (optional). Next, we will define a new column to represent the grouping. And then, we will extract end point signal values

QC Steps and Model Fitting

We will filter out peptides with low signals. In order to combine the values from different exposure times into a single value, a simple linear regression model of the Medain_SigmBg as a function of exposure time is fitted. The slope of of the model fit and \(R^2\) are then used for quality control and samples comparison. The slope is also scaled by multiplying by 100 and log2 transformed (Slope_Transformed). We then filter out peptides with poor linear fit and references peptides.

Global Signal Intensity

For a global signal intensity across all samples/groups, few figures can be plotted based on the Slope_Transformed values.

Global CV Plots

We will plot the coefficient of variation on both the normal and normalized fits. This will help us to identify groups with high variation that could be explained by sample outliers.


# Plot a CV figure using the modeled scaled data
krsa_cv_plot(data_modeled$scaled, new_pep)
#> `geom_smooth()` using formula 'y ~ x'


# Plot a CV figure using the modeled normalized data
krsa_cv_plot(data_modeled$normalized, new_pep)
#> `geom_smooth()` using formula 'y ~ x'

Global Violin Plots

We will plot violin figures to examine global signal differences between groups/samples.

# Plot a violin figure and facet by the (Group) variable
krsa_violin_plot(data_modeled$scaled, new_pep, "Group")


# Plot a violin figure and facet by the (Barcode) variable
krsa_violin_plot(data_modeled$scaled, new_pep, "Barcode")

Global Heatmaps

The heatmap represent all the peptides present on the chip except the positive/internal controls and peptides that failed to pass QC. The heatmaps are scaled by row to highlight the peptide signal differences across the samples. A hierarchical unsupervised clustering is applied both on the peptides and the samples to potentially group similar signatures.


# Generates a heatmap using the modeled scaled data
krsa_heatmap(data_modeled$scaled, new_pep, scale = "row")


# Generates a heatmap using the modeled normalized data
krsa_heatmap(data_modeled$normalized, new_pep, scale = "row")


# Generates a heatmap using the modeled grouped data
krsa_heatmap_grouped(data_modeled$grouped, new_pep, scale = "row")

Group Comparison

To compare between samples, a two-group comparison is performed. In this case, the two group comparisons are:

The Slope_Transforemed ratio between each group, paired by chip, is calculated to the fold change. Based on the fold change, peptides that pass a certain fold change threshold are considered significant hits. Also, quality control steps applied in each comparison to filter out peptides that do not reach specific criteria:

These Filtering Parameters (fold change threshold, QC criteria) can be modified to adjust the stringency of the analysis. The Filtering Parameters that are used for this analysis:


# Define Groups to be compared
comparisons <- list(Comp1 = c("B", "A"))

# This function will run both QC steps (krsa_filter_lowPeps, krsa_filter_nonLinear) and krsa_filter_ref_pep
krsa_quick_filter(data = data_pw_200, data2 = data_modeled$scaled,
    signal_threshold = 5, r2_threshold = 0.9,
    groups = comparisons$Comp1) -> pep_passed_qc
#> Filtered out 65 Peptides
#> Filtered out 1 Peptides
#> Filtered out 10 Peptides

# This function calculates log2 fold change values between the defined groups
# The byChip argument lets you calculates the log2 fold change the results within each chip
krsa_group_diff(data_modeled$scaled, comparisons$Comp1, pep_passed_qc, byChip = T) -> diff_df

# Extract top peptides based on the LFC cutoff
krsa_get_diff(diff_df,totalMeanLFC ,c(0.2,0.3)) %>% list("meanLFC" = .) -> sigPeps
krsa_get_diff_byChip(diff_df, LFC, c(0.2,0.3)) -> sigPepsPerChip

# Combine the peptides hits in one list
sigPeps_total <- list(sigPeps, sigPepsPerChip) %>% unlist(recursive = F) %>%  unlist(recursive = F)

B vs A (Heatmap)

After applying the Filtering Parameters for this group comparison, only 46/141 peptides carried forward in the analysis (i.e. 46 hits). Below are some figures to visualize the differences between these samples for considering these hits.


# generates a heatmap using the selected groups and peptides
krsa_heatmap(data_modeled$normalized, sigPeps$meanLFC$`0.2`, groups = comparisons$Comp1, scale = "row")

B vs A (Violin Plot)

# generates a violin plot using the selected groups and peptides
krsa_violin_plot(data_modeled$scaled, sigPeps$meanLFC$`0.2`, "Barcode", groups = comparisons$Comp1)


# generate a grouped violin/boxplot plot using the selected groups and peptides with more options, like a statistical test. check krsa_violin_plot_grouped() arguments for more options

krsa_violin_plot_grouped(data_modeled$scaled, sigPeps$meanLFC$`0.2`, comparisons, 
                         byGroup = F, dots = F,
                         groups = comparisons$Comp1, avg_line = T)
#> Warning: Ignoring unknown parameters: byGroup


krsa_violin_plot_grouped(data_modeled$grouped, sigPeps$meanLFC$`0.2`, comparisons, 
                         groups = comparisons$Comp1, avg_line = T)

B vs A (Waterfall Plot)

This waterfall represents the log2 fold changes between the two groups at each peptide.

# generates a waterfall of the log2 fold change values for the selected peptide (top peptides)
krsa_waterfall(diff_df, 0.2, byChip = T)

B vs A (Curve Plot)

We generate curves plots per peptide using the last cycle data to visualize the linear fit model

# generates a curve plot representing the linear model fit for the selected peptide (top peptides)
krsa_curve_plot(data_pw, sigPeps$meanLFC$`0.2`, groups = comparisons$Comp1)

B vs A (Upstream Kinase Analysis)

The lab carefully curated and mapped the kinases that can act and phosphorylate each peptide present on the chip. This was achieved by using multiple sources including GPS 3.0, Kinexus Phosphonet, PhosphoELM and PhosphoSite Plus. Based on that association between peptides and kinases, a random sampling analysis is performed for these hits. The basic idea of KRSA is: For each iteration (2000 iterations performed in this analysis), the same number of hits are randomly selected from the total 141/or 193 peptides present on the chip. Predicted kinases are then mapped to this sample list of peptides and number of kinases are determined. The kinase count from the actual hits and random sampling is then compared to determine the significance.



# load in chip coverage and kinase-substrate files OR upload your own files
# if PTK chip, use:
# chipCov <- KRSA_coverage_PTK_PamChip_86402_v1
# KRSA_file <- KRSA_Mapping_PTK_PamChip_86402_v1

# STK chip
chipCov <- KRSA_coverage_STK_PamChip_87102_v1
KRSA_file <- KRSA_Mapping_STK_PamChip_87102_v1



# run the KRSA function to do the random sampling analysis, set seed that can be used later to reproduce results, and choose number of iterations
krsa(sigPeps$meanLFC$`0.2`, return_count = T, seed = 123, itr = 2000,
     map_file = KRSA_file, cov_file = chipCov) -> fin

# View the Z score table
kable(head(fin$KRSA_Table,25), digits = 3)
Kinase Observed SamplingAvg SD Z
DYRK 7 17.605 2.733 -3.880
JNK 8 15.700 2.668 -2.887
PIM 31 23.086 2.781 2.846
ERK 12 19.294 2.743 -2.659
DMPK 40 33.616 2.447 2.609
NMO 0 3.545 1.530 -2.317
PLK 9 15.007 2.693 -2.231
STE7 11 16.708 2.640 -2.162
RSK 33 27.064 2.749 2.159
PKCH 9 5.230 1.779 2.120
PAKB 2 0.653 0.656 2.052
AMPK 12 7.800 2.063 2.036
PDHK 26 20.416 2.800 1.994
PKA 25 19.550 2.792 1.952
AKT 18 13.084 2.556 1.924
CDK 28 32.669 2.586 -1.805
P38 12 16.988 2.774 -1.798
RAD53 15 10.700 2.391 1.798
RIPK 8 12.364 2.487 -1.755
PKG 20 15.655 2.580 1.684
MLCK 0 1.641 1.026 -1.599
STE11 1 3.294 1.493 -1.537
KHS 1 0.310 0.463 1.490
TAO 1 0.321 0.467 1.454
WNK 1 0.330 0.470 1.426

# to save file
#fin$KRSA_Table %>% write_delim("acrossChip_KRSA_FullTable_comp1.txt", delim = "\t")

# find top and bottom kinases
bothways <- c(pull(head(fin$KRSA_Table, 10), Kinase), pull(tail(fin$KRSA_Table, 10), Kinase))

# Use these kinase to generate histogram plots for each selected kinase
krsa_histogram_plot(fin$KRSA_Table, fin$count_mtx, bothways)



# For parallel computing, load the furrr package:
# opens multiple R sessions to run faster
plan(multisession)

# Run the KRSA function across the different sets of peptides using the furrr package for parallel computing
future_map(sigPeps_total, krsa) -> mutiple_krsa_outputs


# For none parallel computing:
# Run KRSA function across the different sets of peptides
#map(sigPeps_total, krsa) -> mutiple_krsa_outputs

# Tidy output
df <- data.frame(matrix(unlist(mutiple_krsa_outputs), ncol = max(lengths(mutiple_krsa_outputs)), byrow = TRUE))
df <- setNames(do.call(rbind.data.frame, mutiple_krsa_outputs), names(mutiple_krsa_outputs$meanLFC.0.2))

df %>% rownames_to_column("method") %>% select(Kinase, Z, method) %>% 
  mutate(method = str_extract(method, "\\w+\\.\\w+\\.\\w+")) %>% 
  mutate(method = gsub("(^\\w+)[\\.]", "\\1>", method)) %>% 
  mutate_if(is.numeric, round, 2) -> df

df %>% 
  pivot_wider(names_from = method, values_from = Z) -> df2
  
# Generates a table of the Z scores across the different sets of peptides
# df2 %>% 
#   gt() %>% tab_spanner_delim(delim = ">")

# Creates an average Z score table using the across chip analysis
df %>% 
  filter(grepl("mean", method)) %>% 
  select(Kinase, Z, method) %>% group_by(Kinase) %>% mutate(AvgZ = mean(Z)) -> AvgZTable

# save file
#AvgZTable %>% write_delim("acrossChip_KRSA_Table_comp1.txt", delim = "\t")

# Creates an average Z score table using the within chip analysis
df %>% 
  filter(!grepl("mean", method)) %>% 
  select(Kinase, Z, method) %>% group_by(Kinase) %>% mutate(AvgZ = mean(Z)) -> AvgZTable2

# save file
#AvgZTable %>% write_delim("withinChip_KRSA_Table_comp1.txt", delim = "\t")

# Extract top kinases based on abs(Z) score
krsa_top_hits(AvgZTable2, 2) -> kinases_hits

#krsa_top_hits(AvgZTable2, 1.75)
#krsa_top_hits(AvgZTable2, 1.5)

# Show the number of peptides per each set in atable
krsa_show_peptides(sigPeps_total)
#> # A tibble: 8 x 2
#>   Method        NumberOfPeptides
#>   <chr>                    <int>
#> 1 meanLFC.0.2                 46
#> 2 meanLFC.0.3                 40
#> 3 710317402.0.2               43
#> 4 710317402.0.3               39
#> 5 710317404.0.2               51
#> 6 710317404.0.3               46
#> 7 710317405.0.2               49
#> 8 710317405.0.3               45

B vs A (Z Scores Plot)

We will plot the individual and averaged Z scores using both the across and within chip analyses.


# Generates Z scores waterfall plots
krsa_zscores_plot(AvgZTable)

krsa_zscores_plot(AvgZTable2)

B vs A (Reverse KRSA Plot)

We will use the reverse KRSA plot function, to plot the log2 fold chnage values for all peptides mapped to kinase hits. This will help us examine the activity of the kinase


# plot the reverse KRSA figure for top kinases to determine their activity levels
krsa_reverse_krsa_plot(chipCov, diff_df, kinases_hits, 0.2, byChip = F)

B vs A (Coverage Plot)

To view the coverage of kinases across the full list of peptides on the chip, we will use the coverage plot function


# generates a kinase covergae plot 
krsa_coverage_plot(chipCov, AvgZTable2, chipType)

B vs A (Ball Model Network)

We will view the ball model network function, to generate a model representing the protein-protein interactions between kinases


# Plot the network ball model
krsa_ball_model(kinases_hits, AvgZTable2, 10, 2.5, 4.8)

Session Info

installed.packages()[names(sessionInfo()$otherPkgs), "Version"]
#>     furrr    future        gt   forcats   stringr     dplyr     purrr     readr 
#>   "0.2.1"  "1.21.0"   "0.2.2"   "0.5.0"   "1.4.0"   "1.0.4"   "0.3.4"   "1.4.0" 
#>     tidyr    tibble   ggplot2 tidyverse     knitr      KRSA 
#>   "1.1.2"   "3.0.6"   "3.3.3"   "1.3.0"    "1.30"   "0.9.3"